
11.

Assessing Fit of Unidimensional Item Response Theory Models Using a Bayesian Approach (total citations: 1; self-citations: 0; citations by others: 1)

Sandip Sinharay 《Journal of Educational Measurement》2005,42(4):375-394

Even though Bayesian estimation has recently become quite popular in item response theory (IRT), there is little work on model checking from a Bayesian perspective. This paper applies the posterior predictive model checking (PPMC) method (Guttman, 1967; Rubin, 1984), a popular Bayesian model checking tool, to a number of real applications of unidimensional IRT models. The applications demonstrate how to exploit the flexibility of the posterior predictive checks to meet the needs of the researcher. This paper also examines practical consequences of misfit, an area often ignored in educational measurement literature while assessing model fit.
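The general logic of a posterior predictive check can be illustrated with a minimal sketch. It is not the paper's analysis: the one-item Bernoulli model, the Beta(1, 1) prior, the response vector, and the longest-run discrepancy measure are all assumptions chosen to keep the example small, and the function names are hypothetical.

```python
import random


def posterior_predictive_p_value(observed, posterior_draws, simulate, discrepancy, rng):
    """Share of posterior draws whose replicated data look at least as extreme as the observed data."""
    exceed = 0
    for theta in posterior_draws:
        # Draw one replicated data set from the model at this posterior draw.
        replicated = simulate(theta, len(observed), rng)
        if discrepancy(replicated, theta) >= discrepancy(observed, theta):
            exceed += 1
    return exceed / len(posterior_draws)


def simulate(p, n, rng):
    """Simulate n independent 0/1 responses with success probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def longest_run(data, _theta):
    """Discrepancy measure (assumed for illustration): longest run of identical responses."""
    best = current = 1
    for prev, cur in zip(data, data[1:]):
        current = current + 1 if prev == cur else 1
        best = max(best, current)
    return best


# Hypothetical data: 0/1 responses to a single item.
rng = random.Random(0)
observed = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
successes = sum(observed)
failures = len(observed) - successes

# With a Beta(1, 1) prior the posterior is Beta(1 + successes, 1 + failures).
draws = [rng.betavariate(1 + successes, 1 + failures) for _ in range(2000)]

ppp = posterior_predictive_p_value(observed, draws, simulate, longest_run, rng)
print(f"posterior predictive p-value: {ppp:.3f}")
```

Values of the posterior predictive p-value near 0 or 1 would suggest that the model fails to reproduce the chosen feature of the data. Swapping in a different discrepancy function is how the check is tailored to a particular research question.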

12.

An Approach to Evaluating the Missing Data Assumptions of the Chain and Post-stratification Equating Methods for the NEAT Design

Paul W. Holland Sandip Sinharay Alina A. von Davier Ning Han 《Journal of Educational Measurement》2008,45(1):17-43

Two important types of observed score equating (OSE) methods for the non-equivalent groups with anchor test (NEAT) design are chain equating (CE) and post-stratification equating (PSE). CE and PSE reflect two distinctly different ways of using the information provided by the anchor test for computing OSE functions. Both types of methods include linear and nonlinear equating functions. In practical situations, it is known that the PSE and CE methods will give different results when the two groups of examinees differ on the anchor test. However, given that both types of methods are justified as OSE methods by making different assumptions about the missing data in the NEAT design, it is difficult to conclude which, if either, of the two is more correct in a particular situation. This study compares the predictions of the PSE and CE assumptions for the missing data using a special data set for which the usually missing data are available. Our results indicate that in an equating setting where the linking function is decidedly non-linear and CE and PSE ought to be different, both sets of predictions are quite similar but those for CE are slightly more accurate.
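To make the chaining idea concrete, the following is a minimal sketch of the linear version of chain equating only, not the study's procedure. Form X is linked to the anchor A in population P, and A is linked to form Y in population Q. The score vectors and function names are hypothetical.

```python
from statistics import mean, pstdev


def linear_link(from_scores, to_scores):
    """Return a linear function that matches the mean and SD of from_scores to those of to_scores."""
    m_from, s_from = mean(from_scores), pstdev(from_scores)
    m_to, s_to = mean(to_scores), pstdev(to_scores)
    return lambda score: m_to + (s_to / s_from) * (score - m_from)


def chain_linear_equating(x_p, a_p, a_q, y_q):
    """Compose the X-to-A link (population P) with the A-to-Y link (population Q)."""
    x_to_a = linear_link(x_p, a_p)  # estimated from examinees who took X and A
    a_to_y = linear_link(a_q, y_q)  # estimated from examinees who took Y and A
    return lambda x: a_to_y(x_to_a(x))


# Hypothetical scores: population P took form X plus the anchor,
# population Q took form Y plus the anchor.
x_p = [10, 12, 14, 16, 18]
a_p = [4, 5, 6, 7, 8]
a_q = [5, 6, 7, 8, 9]
y_q = [13, 15, 17, 19, 21]

equate = chain_linear_equating(x_p, a_p, a_q, y_q)
print(equate(14))
```

Post-stratification equating would instead reweight both populations to a common anchor-score distribution before linking X to Y directly, which is why the two approaches can disagree when the groups differ on the anchor.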

13.

Computation and Accuracy Evaluation of Comparable Scores on Culturally Responsive Assessments

Sandip Sinharay Matthew S. Johnson 《Journal of Educational Measurement》2024,61(1):5-46

Culturally responsive assessments have been proposed as potential tools to ensure equity and fairness for examinees from all backgrounds including those from traditionally underserved or minoritized groups. However, these assessments are relatively new and, with few exceptions, are yet to be implemented on a large scale. Consequently, there is a lack of guidance on how one can compute comparable scores on various versions of these assessments. In this paper, the multigroup multidimensional Rasch model is repurposed for modeling data originating from various versions of a culturally responsive assessment and for analyzing such data to compute comparable scores. Two simulation studies are performed to evaluate the performance of the model for data simulated from hypothetical culturally responsive assessments and to find the conditions under which the computed scores are accurate. Recommendations are made for measurement practitioners interested in culturally responsive assessments.

14.

How to Compare Parametric and Nonparametric Person‐Fit Statistics Using Real Data

Sandip Sinharay 《Journal of Educational Measurement》2017,54(4):420-439

Person‐fit assessment (PFA) is concerned with uncovering atypical test performance as reflected in the pattern of scores on individual items on a test. Existing person‐fit statistics (PFSs) include both parametric and nonparametric statistics. Comparison of PFSs has been a popular research topic in PFA, but almost all comparisons have employed simulated data. This article suggests an approach for comparing the performance of parametric and nonparametric PFSs using real data. This article then shows that there is no clear winner between $l_z^*$, a popular parametric PFS, and $H^T$, a popular nonparametric statistic, in a comparison using the suggested approach. This finding is contradictory to the common finding shown by Karabatsos, Dimitrov and Smith, and Tendeiro and Meijer that $H^T$ is more powerful than several parametric PFSs including $l_z$ and $l_z^*$.
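As an illustration of what a parametric person-fit statistic computes, here is a minimal sketch of the standardized log-likelihood statistic l_z. It is chosen here as an assumption for illustration and is not the article's comparison procedure. The item success probabilities are hypothetical; in practice they would come from a fitted IRT model evaluated at the examinee's ability estimate.

```python
import math


def lz_statistic(responses, probs):
    """Standardized log-likelihood of a 0/1 response pattern given model success probabilities."""
    # Observed log-likelihood of the response pattern.
    l0 = sum(u * math.log(p) + (1 - u) * math.log(1 - p)
             for u, p in zip(responses, probs))
    # Expected value and variance of the log-likelihood under the model.
    expected = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    variance = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    return (l0 - expected) / math.sqrt(variance)


# Hypothetical success probabilities, ordered from the easiest item to the hardest.
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]

typical = lz_statistic([1, 1, 1, 1, 0, 0, 0], probs)   # easy items right, hard items wrong
aberrant = lz_statistic([0, 0, 0, 0, 1, 1, 1], probs)  # easy items wrong, hard items right

print(f"typical pattern:  {typical:.2f}")
print(f"aberrant pattern: {aberrant:.2f}")
```

Large negative values flag response patterns that are unlikely under the model. Nonparametric statistics reach a similar judgment without item parameter estimates, for example by comparing an examinee's responses with those of the other examinees.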

15.

The Utility of Augmented Subscores in a Licensure Exam: An Evaluation of Methods Using Empirical Data

Gautam Puhan Sandip Sinharay Shelby Haberman Kevin Larkin 《Applied Measurement in Education》2013,26(3):266-285

Will subscores provide information beyond what is provided by the total score? Is there a method that can estimate more trustworthy subscores than observed subscores? To answer the first question, this study evaluated whether the true subscore was more accurately predicted by the observed subscore or total score. To answer the second question, three subscore estimation methods (i.e., subscore estimated from the observed subscore, total score, or a combination of both the subscore and total score) were compared. Analyses were conducted using data from six licensure tests. Results indicated that reporting subscores at the examinee level may not be necessary as they did not provide much additional information over what is provided by the total score. However, at the institutional level (for institution size ≥ 30), reporting subscores may not be harmful, although they may be redundant because the subscores were predicted equally well by the observed subscores or total scores. Finally, results indicated that estimating the subscore using a combination of observed subscore and total score resulted in the highest reliability.

16.

First Language of Test Takers and Fairness Assessment Procedures

Sandip Sinharay Neil J. Dorans Longjuan Liang 《Educational Measurement》2011,30(2):25-35

Over the past few decades, those who take tests in the United States have exhibited increasing diversity with respect to native language. Standard psychometric procedures for ensuring item and test fairness that have existed for some time were developed when test‐taking groups were predominantly native English speakers. A better understanding of the potential influence that insufficient language proficiency may have on the efficacy of these procedures is needed. This paper represents a first step in arriving at this better understanding. We begin by addressing some of the issues that arise in a context in which assessments in a language such as English are taken increasingly by groups that may not possess the language proficiency needed to take the test. For illustrative purposes, we use the first‐language status of a test taker as a surrogate for language proficiency and describe an approach to examining how the results of fairness procedures are affected by inclusion or exclusion of those who report that English is not their first language in the fairness analyses. Furthermore, we explore the sensitivity of the results of these procedures, differential item functioning (DIF) and score equating, to potential shifts in population composition. We employ data from a large‐volume testing program for this illustrative purpose. The equating results were not affected by either inclusion or exclusion of such test takers in the analysis sample, or by shifts in population composition. The effect on DIF results, however, varied across focal groups.

17.

Analysis of Added Value of Subscores With Respect to Classification

Sandip Sinharay 《Journal of Educational Measurement》2014,51(2):212-222

Brennan noted that users of test scores often want (indeed, demand) that subscores be reported, along with total test scores, for diagnostic purposes. Haberman suggested a method based on classical test theory (CTT) to determine if subscores have added value over the total score. One way to interpret the method is that a subscore has added value only if it has a better agreement than the total score with the corresponding subscore on a parallel form. The focus of this article is on classification of the examinees into “pass” and “fail” (or master and nonmaster) categories based on subscores. A new CTT‐based method is suggested to assess whether classification based on a subscore is in better agreement, than classification based on the total score, with classification based on the corresponding subscore on a parallel form. The method can be considered as an assessment of the added value of subscores with respect to classification. The suggested method is applied to data from several operational tests. The added value of subscores with respect to classification is found to be very similar, except at extreme cutscores, to their added value from a value‐added analysis of Haberman.
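The CTT-based added-value comparison attributed to Haberman is commonly expressed through the proportional reduction in mean squared error (PRMSE) for predicting the true subscore. The sketch below illustrates that comparison only, not the classification-based method this article proposes. It assumes that the total score includes the subscore and that the subscore's measurement error is uncorrelated with everything else. The scores, the reliability value, and the function names are hypothetical.

```python
from statistics import mean, pvariance


def covariance(a, b):
    """Population covariance of two equal-length score lists."""
    m_a, m_b = mean(a), mean(b)
    return sum((x - m_a) * (y - m_b) for x, y in zip(a, b)) / len(a)


def prmse_comparison(subscores, totals, subscore_reliability):
    """Return (PRMSE from the observed subscore, PRMSE from the observed total score)."""
    var_s = pvariance(subscores)
    var_x = pvariance(totals)
    # True-subscore variance is the reliable part of the observed-subscore variance.
    true_var_s = subscore_reliability * var_s
    # Assumption: the total contains the subscore, so removing the subscore's
    # error variance from Cov(s, x) leaves Cov(true subscore, x).
    cov_true_s_x = covariance(subscores, totals) - (1 - subscore_reliability) * var_s
    prmse_sub = subscore_reliability
    prmse_total = cov_true_s_x ** 2 / (true_var_s * var_x)
    return prmse_sub, prmse_total


# Hypothetical observed subscores, total scores, and subscore reliability.
subscores = [3, 5, 4, 6, 2, 7, 5, 4]
totals = [20, 28, 25, 30, 15, 33, 27, 22]
prmse_sub, prmse_total = prmse_comparison(subscores, totals, subscore_reliability=0.6)

has_added_value = prmse_sub > prmse_total
print(f"PRMSE (subscore) = {prmse_sub:.3f}, PRMSE (total) = {prmse_total:.3f}")
print("subscore has added value:", has_added_value)
```

Under this criterion a subscore is worth reporting only when the observed subscore predicts the true subscore better than the observed total score does.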

18.

Separation and identification of thyroid autoantibodies in patients with thyroid disorders by hydrophobic column

Pranab S. Basu Ramdhan Majhi Sudip Chatterjee Sandip K Batabyal 《Indian journal of clinical biochemistry : IJCB》2000,15(2):119-123

A method has been developed to separate and identify thyroglobulin autoantibody (TgAb) and thyroid peroxidase autoantibody (TPOAb) in serum obtained from normal subjects and patients with autoimmune thyroid diseases using a phenyl Sepharose CL-4B hydrophobic column. The protein peaks obtained from the hydrophobic column were identified as TgAb and TPOAb by comparing their elution profile with that of commercially purified standard thyroid autoantibodies. The similarity of the inhibitory effects of eluted proteins and of standard thyroid autoantibodies on the lectin concanavalin A-RBC interaction confirmed the separation of TPOAb and TgAb by the hydrophobic column. The eluted fractions from the hydrophobic column were estimated by radioimmunoassay (RIA) to confirm the presence of both autoantibodies. This hydrophobic column method offers the advantage of visual inspection of these autoantibodies by graphic representation of peak height, along with their estimation in autoimmune thyroid disorders.

19.

Comments on “A Note on Subscores” by Samuel A. Livingston

Sandip Sinharay Shelby J. Haberman 《Educational Measurement》2015,34(2):6-7


20.

Audit of the Prevalence of Noncorrelation of Immunofixation with Protein Electrophoresis and Serum Free Light Chain Assays in Multiple Myeloma in a Tertiary Cancer Care Center

Chandramallika Paul Subhosmito Chakraborty S. Sugumar Ranjan Bhattacharya Sandip Rath Sarit Chakraborty 《Indian journal of clinical biochemistry : IJCB》2021,36(3):353

Multiple myeloma (MM) is diagnosed and monitored by correlating a panel of test results, including serum protein electrophoresis (SPE), immunofixation electrophoresis (IFE), and serum free light chain (sFLC) measurements. This audit aimed to evaluate the prevalence of non-correlation and discrepancies among the three investigations (SPE/IFE/sFLC) for assessment of MM. A total of 106 MM patients for whom all three serum test results (SPE/IFE/sFLC) were available were reviewed over 16 months in a tertiary cancer care center. Patients were divided into two groups: group 1, newly diagnosed MM patients who were yet to receive myeloma-specific treatment (n = 48); and group 2, already diagnosed MM patients on treatment and follow-up (n = 58). Treatment modalities included stem cell transplantation and standard chemotherapy regimens. Non-correlation between the three test results (IFE/SPE/sFLC) was observed in 21% of group 1 and 45% of group 2. Three types of discrepancies were detected: (a) IFE showing fewer restriction bands than SPE (8.6% of patients in group 2); (b) SPE/IFE negative with an abnormal sFLC ratio (12.5% of patients in group 1 and 13.7% in group 2); (c) SPE/IFE positive but with a normal sFLC ratio (8% in group 1 and 22% in group 2). To conclude, IFE may sometimes provide information that does not correlate with either the SPE or the sFLC results because of different sensitivities, antigen–antibody interactions, or treatment. Hence, SPE plus sFLC may be more useful, particularly for patients on follow-up, while IFE plus sFLC may help screen new patients. The judicious selection of biochemical assays can effectively reduce the treatment cost in a developing country like India.